Variable Selection in Random Forest with Application to Quantitative Structure-Activity Relationship
Abstract
A wrapper variable selection procedure is proposed for use with learning machines that generate a measure of variable importance, such as Random Forest. The procedure iteratively removes low-ranking variables and assesses the learning machine's performance by cross-validation. The procedure is implemented for Random Forest on some QSAR modeling examples from drug discovery and development. It is shown that the non-recursive version of the procedure outperforms the recursive version, and that the default Random Forest mtry function is usually adequate. The paper concludes with some comments about performance assessment and the dangers of using Random Forest's out-of-bag error estimate in a variable selection wrapper.
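The non-recursive wrapper described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. To stay dependency-free it replaces Random Forest with a toy learner (k-nearest-neighbour regression) and a toy importance measure (absolute Pearson correlation). All function names, the halving fraction, the fold count, and the synthetic data are assumptions made for the example. Variables are ranked once per cross-validation fold using training rows only, then the lowest-ranked variables are dropped step by step without re-ranking.

```python
import random
from statistics import mean

def abs_correlation(xs, ys):
    """Toy importance score (stand-in for Random Forest importance)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return abs(sxy) / ((sxx * syy) ** 0.5) if sxx > 0 and syy > 0 else 0.0

def rank_variables(X, y):
    """Return column indices ordered from most to least important."""
    p = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])

def knn_predict(X_train, y_train, x, cols, k=3):
    """Toy learner: mean response of the k nearest training rows on `cols`."""
    dist = [(sum((r[j] - x[j]) ** 2 for j in cols), t)
            for r, t in zip(X_train, y_train)]
    dist.sort(key=lambda d: d[0])
    return mean(t for _, t in dist[:k])

def cv_error_by_subset_size(X, y, n_folds=5, fraction=0.5, seed=0):
    """Non-recursive wrapper: rank once per fold, then shrink the subset."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    sq_err = {}  # subset size -> squared errors pooled over all folds
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        ranking = rank_variables(X_tr, y_tr)  # training rows only
        size = len(ranking)
        while True:
            cols = ranking[:size]  # keep the top-ranked variables
            for i in test:
                pred = knn_predict(X_tr, y_tr, X[i], cols)
                sq_err.setdefault(size, []).append((pred - y[i]) ** 2)
            if size == 1:
                break
            size = max(1, int(size * fraction))  # drop the low-ranking part
    return {size: mean(errs) for size, errs in sq_err.items()}

# Synthetic data (assumption): only variables 0 and 1 carry signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
y = [row[0] + 0.5 * row[1] + data_rng.gauss(0, 0.1) for row in X]

errors = cv_error_by_subset_size(X, y)
best_size = min(errors, key=errors.get)
```

Here `errors` maps each subset size to its cross-validated mean squared error, and `best_size` is the size with the lowest error. A recursive variant would call `rank_variables` again on the reduced column set after each removal step. Ranking inside each fold keeps the held-out rows out of the selection step, which is one common way to avoid an optimistically biased error estimate.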
Related resources
Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules
Leo Breiman’s Random Forest ensemble learning procedure is applied to the problem of Quantitative Structure-Activity Relationship (QSAR) modeling for pharmaceutical molecules. This entails using a quantitative description of a compound’s molecular structure to predict that compound’s biological activity as measured in an in vitro assay. Without any parameter tuning, the performance of Random Fo...
Application of Genetic Algorithms for Pixel Selection in MIA-QSAR Studies on Anti-HIV HEPT Analogues for New Design Derivatives
Quantitative structure-activity relationship (QSAR) analysis has been carried out with a series of 107 anti-HIV HEPT compounds with antiviral activity, which was performed by chemometrics methods. Bi-dimensional images were used to calculate some pixels and multivariate image analysis was applied to QSAR modelling of the anti-HIV potential of HEPT analogues by means of multivariate calibration,...
Quantitative Structure-Activity Relationship Studies of 4-Imidazolyl-1,4-dihydropyridines as Calcium Channel Blockers
Objective(s): The structure-activity relationship of a series of 36 molecules showing L-type calcium channel blocking was studied using a QSAR (quantitative structure-activity relationship) method. Materials and Methods: Structures were optimized by the semi-empirical AM1 quantum-chemical method, which was also used to find structure-calcium channel blocking activity trends. Several types of ...
Quantitative structure-activity relationship (QSAR) study of CCR2b receptor inhibitors using SW-MLR and GA-MLR approaches
In this paper, the quantitative structure-activity relationship (QSAR) of the CCR2b receptor inhibitors was scrutinized. First, the molecular descriptors were calculated using the Dragon package. Then, the stepwise multiple linear regression (SW-MLR) and genetic algorithm multiple linear regression (GA-MLR) variable selection methods were employed to select and implement th...